Given a set of several inputs into a system (e.g., independent variables characterizing stimuli) and a set of several stochastically non-independent 
outputs (e.g., random variables describing different aspects of responses), how can one determine, for each of the outputs, which of the inputs it 
is influenced by? The problem has applications ranging from modeling pairwise comparisons to reconstructing mental processing architectures 
to conjoint testing. A necessary and sufficient condition for a given pattern of selective influences is provided by the Joint Distribution Criterion, 
according to which the problem of "what influences what" is equivalent to that of the existence of a joint distribution for a certain set of random 
variables. For inputs and outputs with finite sets of values this criterion translates into a test of consistency of a certain system of linear equations 
and inequalities (Linear Feasibility Test) which can be performed by means of linear programming. While new in the behavioral context, both this 
test and the Joint Distribution Criterion on which it is based have been previously proposed in quantum physics, in dealing with generalizations of 
Bell inequalities for the quantum entanglement problem. The parallels between this problem and that of selective influences in behavioral sciences 
are established by observing that noncommuting measurements in quantum physics are mutually exclusive and can therefore be treated as different 
levels of one and the same factor. 
architectures, random outputs, selective influences, quantum entanglement, Thurstonian scaling. 




1. INTRODUCTION 

This paper deals with diagrams of selective influences, like 
this one: 



(1) 



The Greek letters in this diagram represent inputs, or external 
factors, e.g., parameters of stimuli whose values can be cho- 
sen at will or observed and recorded. The capital Roman letters 
stand for random outputs characterizing reactions of the system 
(an observer, a group of observers, stock market, a set of photons,
etc.). The arrows show which factor influences which random output.
The factors are treated as deterministic entities: even if
$\alpha, \beta, \gamma, \delta$ in reality vary randomly (e.g., being randomly
generated by a computer program, or being concomitant parameters of
observations, such as age of respondents), for the purposes of
analyzing selective influences the random outputs $A, B, C$ are always
viewed as conditioned upon various combinations of specific values of
$\alpha, \beta, \gamma, \delta$. The first question to ask is: what is the meaning
of the above diagram if the random outputs $A, B, C$ in it are not
necessarily stochastically independent? (If they are, the answer is of
course trivial.) And once the meaning of the diagram of selective
influences is established, how can one determine that this diagram
correctly characterizes the dependence of the joint distributions of
the random outputs $A, B, C$ on the external factors
$\alpha, \beta, \gamma, \delta$?

These questions are important, because the assumption of 
stochastic independence of the outputs more often than not is 
either demonstrably false or adopted for expediency alone, with 
no other justification. At the same time the assumption of selec- 
tivity in causal relations between inputs and stochastic outputs is 
ubiquitous in theoretical modeling, often being built in the very 
language of the models. For instance, in Thurstone's most gen- 
eral model of pairwise comparisons (Thurstone, 1927) it is as- 
sumed that each of the two stimuli is mapped into "its" internal 
representation, while the two representations are stochastically 
interdependent random entities. In Dzhafarov (2003), Dzhafarov 
and Gluhovsky (2006), and Kujala and Dzhafarov (2008) the 
reader may find other motivating applications for the notion of 
selective influences: same-different comparisons, conjoint test- 
ing, parallel-serial networks of mental operations, response time 
decompositions, and all conceivable combinations of regression 
analysis and factor analysis. In this paper we add another moti- 
vating example, the quantum entanglement problem in quantum 
physics. 

This paper continues and expands the analysis of selective in- 
fluences presented in Dzhafarov and Kujala (2010). The famil- 
iarity with it can be helpful, but the main concepts, terminology, 
and notation are recapitulated in Section 2. Unlike in Dzha- 
farov and Kujala (2010), however, here we do not pursue the 
goal of maximal generality of formulations, focusing instead on 
the conceptual set-up that applies to commonly encountered ex- 
perimental designs. This means a finite number of factors, each 
having a finite number of values. It also means that the random 
outcomes influenced by these factors are random variables in 
the narrow sense of the word: their values are vectors of real 
numbers or elements of countable sets, rather than more complex
structures, such as functions or sets. This is done primarily to
simplify and shorten exposition, and also because the Linear
Feasibility Test, a new (for behavioral sciences) application of
the Joint Distribution Criterion on which we focus in this paper
(Section 3), is confined to finite sets of finite-valued factors and
finite-valued random variables. This also allows us to emphasize
a simple but important and previously overlooked proposition,
Theorem 2.3, which essentially says that, when dealing with observable
random variables, the unobservable random entities of the theory can
also be assumed to be random variables (in the narrow sense). In
another respect, however, the present treatment is more general than
that in Dzhafarov and Kujala (2010): we allow for incomplete designs,
those in which some but not necessarily all combinations of the values
of the factors serve as allowable treatments. This modification is
critical for the possibility of representing any diagram of selective
influences, such as (1), in a canonical form, with every random output
being selectively influenced by one and only one factor.

Dzhafarov and Kujala

As it turns out, both the Linear Feasibility Test and the 
Joint Distribution Criterion on which it is based have their ana- 
logues in quantum physics. 1 To appreciate the analogy, however, 
one has to adopt the interpretation of noncommuting quantum 
measurements performed on a given component of a quantum- 
entangled system as mutually exclusive factor levels of the same 
factor. In Sections 2.6 and 3 we discuss the parallels between 
the existence of a classical explanation for an entanglement sit- 
uation in quantum mechanics and the adherence of a behavioral 
experiment to a diagram of selective influences. 

The term "test" in this paper is used in the meaning of nec- 
essary (sometimes necessary and sufficient) conditions for dia- 
grams of selective influences. The usage is the same as when we 
speak of the tests for convergence in calculus or for divisibility 
in arithmetic. That is, the meaning of the term is non-statistical. 
We assume that random outputs are known on the population 
level. General considerations related to statistical tests based on 
our population level tests are discussed in Section 3.6, but spe- 
cific statistical issues are outside the scope of this paper. 



2. BASIC NOTIONS 

In this section, we establish the terminology, notation, and re- 
capitulate basic facts related to factors, random variables, and 
the dependence of the latter on the former. We follow Dzha- 
farov and Kujala (2010), adding observations related to the fac- 
torial designs being incomplete and random outputs being ran- 
dom variables in the narrow sense of the term. At the end of the 
section we discuss the parallels between the issue of selective 
influence in behavioral sciences and the quantum entanglement 
problem. 



1 We are grateful to Jerome Busemeyer of Indiana University who pointed out
to us that the formulation of the Joint Distribution Criterion in our earlier work
has the same formal structure as the identically titled criterion in Fine
(1982a-b), in his analysis of quantum entanglement.



2.1. Factors, factor points, treatments 

A factor $\alpha$ is treated as a set of factor points, each of which
has the format "value (or level) $x$ of factor $\alpha$." In symbols, this
can be presented as $(x, \text{'}\alpha\text{'})$, where '$\alpha$' is the unique name of
the set $\alpha$ rather than the set itself. It is convenient to write
$x^{\alpha}$ in place of $(x, \text{'}\alpha\text{'})$. Thus, if a factor with the name
'intensity' has three levels, 'low,' 'medium,' and 'high,' then this
factor is taken to be the set

$$\text{intensity} = \{\text{low}^{\text{intensity}}, \text{medium}^{\text{intensity}}, \text{high}^{\text{intensity}}\}.$$

There is no circularity here, for, say, the factor point
$\text{low}^{\text{intensity}}$ stands for (value = low, name = 'intensity') rather
than (value = low, set = intensity).

We will deal with finite sets of factors $\Phi = \{\alpha_1, \ldots, \alpha_m\}$, with
each factor $\alpha \in \Phi$ consisting of a finite number of factor points,

$$\alpha = \{v_1^{\alpha}, \ldots, v_{k_\alpha}^{\alpha}\}.$$

Clearly, $\alpha \cap \beta = \emptyset$ for any distinct $\alpha, \beta \in \Phi$.

A treatment, as usual, is defined as the set of factor points
containing one factor point from each factor,

$$\phi = \{x^{\alpha_1}, \ldots, x^{\alpha_m}\} \in \alpha_1 \times \ldots \times \alpha_m.$$

The set of treatments (used in an experiment or considered
in a theory) is denoted by $T \subset \alpha_1 \times \ldots \times \alpha_m$ and assumed to
be nonempty. Note that $T$ need not include all possible combinations
of factor points. This is an important consideration in
view of the "canonical rearrangement" described below. Also,
incompletely crossed designs occur broadly: in an experiment
because the entire set $\alpha_1 \times \ldots \times \alpha_m$ may be too large, or in a
theory because certain combinations of factor points may be physically
or logically impossible (e.g., contrast and shape cannot be
completely crossed if zero is one of the values for contrast).
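
As an informal illustration (not part of the formal development), the set-theoretic definitions of this subsection can be sketched in Python. The helper `make_factor`, the particular level names, and the admissibility rule (zero contrast goes only with a placeholder shape level `"none"`) are hypothetical choices made for this example.

```python
from itertools import product

def make_factor(name, values):
    """A factor is a set of factor points (value, name), mirroring (x, 'alpha')."""
    return frozenset((v, name) for v in values)

# Hypothetical factors for the contrast/shape example.
contrast = make_factor("contrast", [0, 1, 2])
shape = make_factor("shape", ["none", "square", "circle"])

# Distinct factors are disjoint because the factor name is part of each point.
assert contrast.isdisjoint(shape)

# Complete crossing: every treatment holds one point from each factor.
full_crossing = {frozenset(t) for t in product(contrast, shape)}

def admissible(treatment):
    """Assumed rule: zero contrast occurs with shape 'none' and only with it."""
    point = {name: value for value, name in treatment}
    return (point["contrast"] == 0) == (point["shape"] == "none")

# The set of treatments T is a nonempty subset of the complete crossing.
T = {t for t in full_crossing if admissible(t)}

print(len(full_crossing), len(T))  # 9 5
```

Only five of the nine combinations survive, so the design is incompletely crossed in the sense described above.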

2.2. Random variables 

We assume the reader is familiar with the notion of a random
entity (random variable in the general sense of the term) $A$
associated with an observation space $(\mathcal{A}, \Sigma)$, where $\mathcal{A}$ is the set of
possible values for $A$, and $\Sigma$ a sigma-algebra (set of events) on
$\mathcal{A}$. A random variable (in the narrow sense) is a special case of
a random entity, defined as follows:

(i) if $\mathcal{A}$ is countable, $\Sigma$ is the power set of $\mathcal{A}$, then $A$ is a
random variable;

(ii) if $\mathcal{A}$ is an interval of reals, $\Sigma$ is the Lebesgue sigma-algebra
on $\mathcal{A}$, then $A$ is a random variable;

(iii) if $A_1, \ldots, A_n$ are random variables, then any jointly
distributed vector $(A_1, \ldots, A_n)$ whose observation space is the
conventionally understood product of the observation spaces for
$A_1, \ldots, A_n$ is a random variable.

We use the relational symbol $\sim$ in the meaning of "is
distributed as." $A \sim B$ is well defined irrespective of whether $A$ and
$B$ are jointly distributed.

Let, for each treatment $\phi \in T$, there be a vector of jointly
distributed random variables $A = (A_1, \ldots, A_n)$ with a fixed (product)
observation space and the probability measure $\mu_\phi$ that depends
on $\phi$. 2 Then we say that we have a vector of jointly distributed
random variables that depends on treatment $\phi$, and write

$$A(\phi) = (A_1, \ldots, A_n)(\phi), \quad \phi \in T.$$

A correct way of thinking of $A(\phi)$ is that it represents a set of
vectors of jointly distributed random variables, each of these
vectors being labeled (indexed) by a particular treatment. Any
subvector of $A(\phi)$ should also be written with the argument $\phi$,
say, $(A_1, A_2, A_3)(\phi)$. If $\phi$ is explicated as
$\phi = \{x^{\alpha_1}, \ldots, x^{\alpha_m}\}$, we write $A(\phi) = A(x^{\alpha_1}, \ldots, x^{\alpha_m})$.

It is important to note that for distinct treatments $\phi_1$ and $\phi_2$
the corresponding $A(\phi_1)$ and $A(\phi_2)$ do not possess a joint
distribution; they are stochastically unrelated. This is easy to
understand: since $\phi_1$ and $\phi_2$ are mutually exclusive conditions for
observing values of $A$, there is no non-arbitrary way of choosing
which value $a = (a_1, \ldots, a_n)$ observed at $\phi_1$ should be paired
with which value $a' = (a'_1, \ldots, a'_n)$ observed at $\phi_2$. To consider
$A(\phi_1)$ and $A(\phi_2)$ stochastically independent and to pair every
possible value of $A(\phi_1)$ with every possible value of $A(\phi_2)$ is as
arbitrary as, say, to consider them positively correlated and to
pair every quantile of $A(\phi_1)$ with the corresponding quantile of
$A(\phi_2)$.

2.3. Arrow diagrams, canonically (re)arranged 

Given a set of factors $\Phi = \{\alpha_1, \ldots, \alpha_m\}$ and a vector
$A(\phi) = (A_1, \ldots, A_n)(\phi)$ of random variables depending on treatment,
an arrow diagram is a mapping

$$M : \{1, \ldots, n\} \to 2^{\Phi} \tag{2}$$

($2^{\Phi}$ being the set of subsets of $\Phi$). Later, in Definition 2.1, the
arrows will be interpreted as indicating selective influences, but
for now this is unimportant. The set

$$\Phi_i = M(i), \quad (i = 1, \ldots, n),$$

is referred to as the subset of factors corresponding to $A_i$. It
determines, for any treatment $\phi \in T$, the subtreatments $\phi_{\Phi_i}$
defined as

$$\phi_{\Phi_i} = \{x^{\alpha} \in \phi : \alpha \in \Phi_i\}, \quad i = 1, \ldots, n.$$

Subtreatments $\phi_{\Phi_i}$ across all $\phi \in T$ can be viewed as admissible
values of the subset of factors $\Phi_i$ ($i = 1, \ldots, n$). Note that $\phi_{\Phi_i}$ is
empty whenever $\Phi_i$ is empty.

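The simplest arrow diagram is bijective, with correspondences

$$
\begin{array}{ccc}
\alpha_1 & \cdots & \alpha_n \\
\downarrow & & \downarrow \\
A_1 & \cdots & A_n
\end{array}
\tag{3}
$$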



We can simplify the subsequent discussion without sacrificing
generality by agreeing to reduce each arrow diagram (in the context
of selective influences) to a bijective form, by appropriately
redefining factors and treatments. It is obvious how this should
be done. Given the subsets of factors $\Phi_1, \ldots, \Phi_n$ determined by
an arrow diagram (2), each $\Phi_i$ can be viewed as a factor identified
with the set of factor points

$$\alpha_i^* = \{(\phi_{\Phi_i})^{\alpha_i^*} : \phi \in T\},$$

in accordance with the notation we have adopted for factor
points: $(\phi_{\Phi_i})^{\alpha_i^*} = (\phi_{\Phi_i}, \text{'}\alpha_i^*\text{'})$. If $\Phi_i$ is empty, then $\phi_{\Phi_i}$ is empty
too, and the factor $\alpha_i^*$ consists of only the dummy factor point
$\emptyset^{\alpha_i^*}$ (where $\emptyset$ denotes the empty set). The set of treatments $T$
for the original factors $\{\alpha_1, \ldots, \alpha_m\}$ should then be redefined for
the vector of new factors $(\alpha_1^*, \ldots, \alpha_n^*)$ as

$$T^* = \{\{(\phi_{\Phi_1})^{\alpha_1^*}, \ldots, (\phi_{\Phi_n})^{\alpha_n^*}\} : \phi \in T\} \subset \alpha_1^* \times \ldots \times \alpha_n^*.$$

We call this (re)definition of factor points, factors, and treatments
the canonical (re)arrangement. We can say that the random
variables following canonical (re)arrangement can be indexed
by the corresponding factors. Thus, when convenient,
we can write in (3) $A_{\alpha_1}$ in place of $A_1$, $A_{\alpha_2}$ in place of $A_2$,
etc. The notation $\phi_{\Phi_i} = \phi_{\{\alpha_i\}}$ then indicates the singleton set
$\{x^{\alpha_i}\} \subset \phi$. As usual, we write $x^{\alpha_i}$ in place of $\{x^{\alpha_i}\}$:
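
The canonical (re)arrangement described in this subsection can be sketched in Python as follows. This is only an illustration under assumed names: the function `canonical_rearrangement`, the labels `alpha*_i`, and the three-factor example are hypothetical, and treatments are represented as frozensets of `(value, factor_name)` points.

```python
from itertools import product

def canonical_rearrangement(T, arrow_diagram):
    """T: set of treatments, each a frozenset of (value, factor_name) points.
    arrow_diagram: dict mapping output index i to the set of factor names Phi_i.
    Returns T*, whose treatments contain one new factor point
    (subtreatment, 'alpha*_i') for each output i."""
    T_star = set()
    for phi in T:
        T_star.add(frozenset(
            (frozenset(p for p in phi if p[1] in factors), f"alpha*_{i}")
            for i, factors in arrow_diagram.items()
        ))
    return T_star

# Hypothetical example: three binary factors, completely crossed.
levels = {"a": [1, 2], "b": [1, 2], "c": [1, 2]}
T = {frozenset(zip(vals, levels)) for vals in product(*levels.values())}

# Output 1 is influenced by {a, b}, output 2 by {b, c}, output 3 by nothing.
diagram = {1: {"a", "b"}, 2: {"b", "c"}, 3: set()}
T_star = canonical_rearrangement(T, diagram)

# Collect the points of each new factor alpha*_i.
new_factors = {
    i: {pt for t in T_star for pt in t if pt[1] == f"alpha*_{i}"}
    for i in diagram
}
sizes = {i: len(pts) for i, pts in new_factors.items()}
print(len(T_star), sizes)  # 8 {1: 4, 2: 4, 3: 1}
```

In this example the new factors have 4, 4, and 1 points, so their complete crossing would contain 16 treatments, whereas the rearranged design contains only 8. This is why incomplete designs have to be allowed for the canonical form to be available in general.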



2.4. The criterion 

Definition 2.1 (Selective influences, bijective form). An arrow
diagram (3) is said to be the diagram of selective influences for
$(A_1, \ldots, A_n)(\phi)$ and $(\alpha_1, \ldots, \alpha_n)$, and we write

$$(A_1, \ldots, A_n) \looparrowleft (\alpha_1, \ldots, \alpha_n),$$

if, for some random entity $R$ and for any treatment
$\phi = \{x^{\alpha_1}, \ldots, x^{\alpha_n}\} \in T$,

$$
\begin{aligned}
(A_1, \ldots, A_n)(\phi) &\sim (f_1(\phi_{\{\alpha_1\}}, R), \ldots, f_n(\phi_{\{\alpha_n\}}, R)) \\
&= (f_1(x^{\alpha_1}, R), \ldots, f_n(x^{\alpha_n}, R)),
\end{aligned}
\tag{4}
$$

where $f_i : \alpha_i \times \mathcal{R} \to \mathcal{A}_i$ ($i = 1, \ldots, n$) are some functions, with
$\mathcal{R}$ denoting the set of possible values of $R$. 3

This definition is difficult to put to work, as it refers to the
existence of a random entity (variable) $R$ without showing how
one can find it or prove that it cannot be found. The following
criterion (necessary and sufficient condition) for
$(A_1, \ldots, A_n) \looparrowleft (\alpha_1, \ldots, \alpha_n)$ circumvents this problem.
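
To make the role of the common random entity $R$ in Definition 2.1 more tangible, here is a minimal simulation sketch. Everything specific in it is an assumption chosen for illustration: two binary factors, a uniformly distributed $R$, and the threshold functions `f1` and `f2` with their numeric thresholds.

```python
import random

# Hypothetical thresholds: f1 depends on the alpha-point only, f2 on the beta-point only.
A_THRESHOLD = {1: 0.5, 2: 0.7}
B_THRESHOLD = {1: 0.5, 2: 0.4}

def f1(x_alpha, r):
    """Output A as a function of the alpha-point and the common source R."""
    return int(r < A_THRESHOLD[x_alpha])

def f2(x_beta, r):
    """Output B as a function of the beta-point and the common source R."""
    return int(r > B_THRESHOLD[x_beta])

def sample(x_alpha, x_beta, n, rng):
    """Draw n realizations of (A, B) at the treatment (x_alpha, x_beta)."""
    out = []
    for _ in range(n):
        r = rng.random()  # one shared value of R per trial
        out.append((f1(x_alpha, r), f2(x_beta, r)))
    return out

rng = random.Random(0)
at_11 = sample(1, 1, 20000, rng)
at_12 = sample(1, 2, 20000, rng)

freq_A_11 = sum(a for a, _ in at_11) / len(at_11)
freq_A_12 = sum(a for a, _ in at_12) / len(at_12)
print(round(freq_A_11, 2), round(freq_A_12, 2))
```

At the treatment (1, 1) the outputs are strongly dependent (A = 1 requires R < 0.5 while B = 1 requires R > 0.5, so they are never both 1), yet the distribution of A does not change when the point of the second factor changes. Stochastic interdependence and selectivity of influences thus coexist in this toy model.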



2 The convenient assumption of the invariance of the observation space for $A$
with respect to $\phi$ is innocuous: one can always redefine the observation spaces
for different treatments so as to make them coincide.



3 It will be shown below, Theorem 2.3, that random entity R can always be 
chosen to be a random variable (in the narrow sense). 
Criterion 2.2 (Joint Distribution Criterion, JDC). A vector of
random variables $A(\phi) = (A_1, \ldots, A_n)(\phi)$ satisfies a diagram of
selective influences (3) if and only if there is a vector of jointly
distributed random variables

$$H = (\overbrace{H_{x_1^{\alpha_1}}, \ldots, H_{x_{k_1}^{\alpha_1}}}^{\text{for } \alpha_1}, \ldots, \overbrace{H_{x_1^{\alpha_n}}, \ldots, H_{x_{k_n}^{\alpha_n}}}^{\text{for } \alpha_n}),$$

one random variable for each factor point of each factor, such
that

$$(H_{\phi_{\{\alpha_1\}}}, \ldots, H_{\phi_{\{\alpha_n\}}}) \sim A(\phi) \tag{5}$$

for every treatment $\phi \in T$.

See Dzhafarov and Kujala (2010) for a proof. The vector $H$ in
the formulation of the JDC is referred to as the JDC-vector for
$A(\phi)$, or the hypothetical JDC-vector for $A(\phi)$, if the existence
of such a vector of jointly distributed variables is in question.

The JDC prompts a simple justification for our definition of
selective influences. Let, for example, $(A, B, C) \looparrowleft (\alpha, \beta, \gamma)$, with
$\alpha = \{1^{\alpha}, 2^{\alpha}\}$, $\beta = \{1^{\beta}, 2^{\beta}, 3^{\beta}\}$, $\gamma = \{1^{\gamma}, 2^{\gamma}, 3^{\gamma}, 4^{\gamma}\}$. Consider all
treatments $\phi$ in which the factor point of $\alpha$ is fixed, say, at $1^{\alpha}$. If
$(A, B, C) \looparrowleft (\alpha, \beta, \gamma)$, then in the vectors of random variables

$$(A, B, C)(1^{\alpha}, 2^{\beta}, 1^{\gamma}), \quad (A, B, C)(1^{\alpha}, 2^{\beta}, 3^{\gamma}), \quad (A, B, C)(1^{\alpha}, 3^{\beta}, 1^{\gamma})$$

the marginal distribution of the variable $A$ is one and the same,

$$A(1^{\alpha}, 2^{\beta}, 1^{\gamma}) \sim A(1^{\alpha}, 2^{\beta}, 3^{\gamma}) \sim A(1^{\alpha}, 3^{\beta}, 1^{\gamma}).$$

But the intuition of selective influences requires more: that we
can denote this variable $A(1^{\alpha})$ because it preserves its identity
(and not just its distribution) no matter what other variables it
is paired with, $(B, C)(2^{\beta}, 1^{\gamma})$, $(B, C)(2^{\beta}, 3^{\gamma})$, or $(B, C)(3^{\beta}, 1^{\gamma})$.
Analogous statements hold for $A(2^{\alpha})$, $B(2^{\beta})$, $B(3^{\beta})$, $C(1^{\gamma})$,
etc. The JDC formalizes the intuitive notion of variables "preserving
their identity" when entering in various combinations
with each other: there are jointly distributed random variables

$$H_{1^{\alpha}}, H_{2^{\alpha}}, H_{1^{\beta}}, H_{2^{\beta}}, H_{3^{\beta}}, H_{1^{\gamma}}, H_{2^{\gamma}}, H_{3^{\gamma}}, H_{4^{\gamma}}$$

whose identity is defined by this joint distribution; when $H_{1^{\alpha}}$
is combined with random variables $H_{2^{\beta}}$ and $H_{1^{\gamma}}$, it forms the
triad $(H_{1^{\alpha}}, H_{2^{\beta}}, H_{1^{\gamma}})$ whose distribution is the same as that
of $(A, B, C)(1^{\alpha}, 2^{\beta}, 1^{\gamma})$; when the same random variable $H_{1^{\alpha}}$
is combined with random variables $H_{2^{\beta}}$ and $H_{3^{\gamma}}$, the triad
$(H_{1^{\alpha}}, H_{2^{\beta}}, H_{3^{\gamma}})$ is distributed as $(A, B, C)(1^{\alpha}, 2^{\beta}, 3^{\gamma})$; and so on.
The key concept is that it is one and the same $H_{1^{\alpha}}$ which
is being paired with other variables, as opposed to different random
variables $A(1^{\alpha}, 2^{\beta}, 1^{\gamma})$, $A(1^{\alpha}, 2^{\beta}, 3^{\gamma})$, $A(1^{\alpha}, 3^{\beta}, 1^{\gamma})$ which
are identically distributed. See Dzhafarov and Kujala (2010) for
a demonstration that the identity is not generally preserved if all
we know is marginal selectivity (as defined in Section 2.5).
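
The "easy" direction of the JDC can be illustrated numerically: starting from an arbitrary joint distribution of a JDC-vector, the distributions at all treatments are obtained as marginals, and the marginal of each output is then automatically the same across treatments sharing the relevant factor point. The sketch below assumes, purely for illustration, two factors with two points each and binary outputs; the names `Q`, `treatment_distribution`, and `marginal_A` are hypothetical.

```python
from fractions import Fraction
from itertools import product
import random

random.seed(0)

# Cells are joint values of (H_{1^alpha}, H_{2^alpha}, H_{1^beta}, H_{2^beta}).
cells = list(product([0, 1], repeat=4))
weights = [random.randint(1, 9) for _ in cells]
total = sum(weights)
Q = {c: Fraction(w, total) for c, w in zip(cells, weights)}  # arbitrary joint pmf

def treatment_distribution(Q, i, j):
    """Distribution of (A, B) at treatment (i^alpha, j^beta), with i, j in {1, 2},
    computed as the 2-marginal of (H_{i^alpha}, H_{j^beta}) under Q."""
    P = {}
    for c, q in Q.items():
        key = (c[i - 1], c[2 + j - 1])
        P[key] = P.get(key, 0) + q
    return P

def marginal_A(P):
    """1-marginal of the first output."""
    return {a: sum(p for (x, _), p in P.items() if x == a) for a in (0, 1)}

P11 = treatment_distribution(Q, 1, 1)
P12 = treatment_distribution(Q, 1, 2)

# A is represented by one and the same H_{1^alpha} in both treatments.
same_marginal = marginal_A(P11) == marginal_A(P12)
print(same_marginal)  # True
```

The converse question, whether given treatment distributions can be obtained from some such joint distribution, is the nontrivial one, and it is the question the criterion answers.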
The following is an important consequence of JDC. 

Theorem 2.3. In Definition 2.1, the random entity $R$ can always
be chosen to be a random variable. Moreover, $R$ can be chosen
arbitrarily, as any continuously (atomlessly) distributed random
variable, e.g., uniformly distributed between 0 and 1.

Proof. The first statement follows from the fact that $R$ can be
chosen to coincide with the JDC-vector $H$, so that

$$f_i(x^{\alpha_i}, H) = H_{x^{\alpha_i}}$$

for $i = 1, \ldots, n$, and $x^{\alpha_i} \in \alpha_i$. The JDC-vector $H$ is a random
variable. The second statement follows from Theorem 1 in Dzhafarov
& Gluhovsky, 2006, based on a general result for standard
Borel spaces (e.g., in Kechris, 1995, p. 116). □



2.5. Three basic properties of selective influences 

For completeness, we list three other fundamental conse- 
quences of JDC (Dzhafarov & Kujala, 2010). 



2.5.1. Nestedness. 

For any subset $\{i_1, \ldots, i_k\}$ of $\{1, \ldots, n\}$, if
$(A_1, \ldots, A_n) \looparrowleft (\alpha_1, \ldots, \alpha_n)$ then
$(A_{i_1}, \ldots, A_{i_k}) \looparrowleft (\alpha_{i_1}, \ldots, \alpha_{i_k})$.



2.5.2. Complete Marginal Selectivity 

For any subset $\{i_1, \ldots, i_k\}$ of $\{1, \ldots, n\}$, if
$(A_1, \ldots, A_n) \looparrowleft (\alpha_1, \ldots, \alpha_n)$ then the $k$-marginal
distribution 4 of $(A_{i_1}, \ldots, A_{i_k})(\phi)$ does not depend on points of
the factors outside $(\alpha_{i_1}, \ldots, \alpha_{i_k})$. In particular, the distribution
of $A_i$ only depends on points of $\alpha_i$, $i = 1, \ldots, n$.

This is, of course, a trivial consequence of the nestedness 
property, but its importance lies in that it provides the easiest 
to check necessary condition for selective influences. 



2.5.3. Invariance under factor-point-specific transformations 
Let $(A_1, \ldots, A_n) \looparrowleft (\alpha_1, \ldots, \alpha_n)$ and

$$H = (H_{x_1^{\alpha_1}}, \ldots, H_{x_{k_1}^{\alpha_1}}, \ldots, H_{x_1^{\alpha_n}}, \ldots, H_{x_{k_n}^{\alpha_n}})$$

be the JDC-vector for $(A_1, \ldots, A_n)(\phi)$. Let $F$ be any function
that applies to $H$ componentwise and produces a corresponding
vector of random variables

$$F(H) = (F(x_1^{\alpha_1}, H_{x_1^{\alpha_1}}), \ldots, F(x_{k_1}^{\alpha_1}, H_{x_{k_1}^{\alpha_1}}), \ldots, F(x_1^{\alpha_n}, H_{x_1^{\alpha_n}}), \ldots, F(x_{k_n}^{\alpha_n}, H_{x_{k_n}^{\alpha_n}})),$$

where we denote by $F(x^{\alpha}, \cdot)$ the application of $F$ to the component
labeled by $x^{\alpha}$. Clearly, $F(H)$ possesses a joint distribution
and contains one component for each factor point. If we now define
a vector of random variables $B(\phi)$ for every treatment $\phi \in T$ as

$$(B_1, \ldots, B_n)(\phi) = (F(\phi_{\{\alpha_1\}}, A_1), \ldots, F(\phi_{\{\alpha_n\}}, A_n))(\phi),$$

then it follows from JDC that $(B_1, \ldots, B_n) \looparrowleft (\alpha_1, \ldots, \alpha_n)$. 5 A
function $F(x^{\alpha_i}, \cdot)$ can be referred to as a factor-point-specific
transformation of the random variable $A_i$, because the random
variable is transformed differently for different points of the factor
assumed to selectively influence it. We can formulate the
property in question by saying that a diagram of selective influences
is invariant under all factor-point-specific transformations
of the random variables. Note that this includes as a special case
transformations which are not factor-point-specific, with

$$F(x_1^{\alpha_i}, \cdot) = \ldots = F(x_{k_i}^{\alpha_i}, \cdot) = F(\alpha_i, \cdot).$$

This property is important for construction and use of tests for
selective influences (Dzhafarov & Kujala, 2010; Kujala & Dzhafarov,
2008).

4 A $k$-marginal distribution is the distribution of a subset of $k$ random variables
($k \geq 1$) in a set of $n \geq k$ variables. In Townsend and Schweickert (1989) the
property was formulated for 1-marginals of a pair of random variables. The
adjective "complete" we use with "marginal selectivity" is to emphasize that
we deal with all possible marginals rather than with just 1-marginals.

Selectivity in Probabilistic Causality



2.6. Quantum entanglement and selective influences 

In psychology, the notion of selective influences was intro- 
duced by Sternberg (1969), in the context of studying "stages" 
of information processing. Sternberg acknowledged that selec- 
tive influences can hold even if the durations of the stages being 
selectively affected are not stochastically independent, but he 
lacked the mathematical apparatus for dealing with this possibility.
Townsend (1984) was the first to study the notion of selectiveness
under stochastic interdependence systematically. He
proposed to formalize the notion of selectively influenced and
stochastically interdependent random variables by the concept
of "indirect nonselectiveness": the conditional distribution of
the variable $A_1$ given any value $a_2$ of the variable $A_2$ depends
on $\alpha_1$ only, and, by symmetry, the conditional distribution of $A_2$
at any $A_1 = a_1$ depends on $\alpha_2$ only. Under the name of
"conditionally selective influence" this notion was mathematically
characterized and generalized in Dzhafarov (1999). It turned
out, however, that this notion could not serve as a general def- 
inition of selective influences, because it did not satisfy some 
intuitive desiderata for such a definition, e.g., the nestedness 
and marginal selectivity properties formulated in Section 2.5. 
Variants of Definition 2.1 of the present paper were proposed in 
Dzhafarov (2003) and both elaborated and generalized in Dzha- 
farov and Gluhovsky (2006), Kujala and Dzhafarov (2008); JDC 
was explicitly formulated in Dzhafarov and Kujala (2010), al- 
though clearly implied in the earlier work. 



5 Since it is possible that $F(x^{\alpha}, H_{x^{\alpha}})$ and $F(y^{\alpha}, H_{y^{\alpha}})$, with $x^{\alpha} \neq y^{\alpha}$, have
different sets of possible values, strictly speaking, one may need to redefine the
functions to ensure that the set of possible values for $B(\phi)$ is the same for
different $\phi$. This is, however, not essential (see footnote 2).



Until very recently (see footnote 1) we were blissfully un- 
aware of the analogous developments in quantum physics. The 
most conspicuous parallels can be found in Fine (1982a-b), but
that work in turn builds on a venerable line of research and think- 
ing: going back first to Bell (1964), and ultimately to Einstein, 
Podolsky, and Rosen's (1935) paper. The issue in question re- 
gards two "noncommuting" measurements, such as those of the 
momentum and of the location of a particle, or spin measure- 
ments along two different axes. For our purposes it is suffi- 
cient to state that when one of two noncommuting measurements 
is performed (without uncertainty about the result), the second 
one cannot be performed on the same system. The key insight 
needed to understand the analogy with the problem of selective 
influences is this: noncommuting measurements on the same system,
being mutually exclusive, can be viewed as levels (mutually
exclusive values) of one and the same external factor.

This is not entirely intuitive. Consider two particles for each
of which one can measure its momentum or its location. The
analogy requires that one view the measurement on particle 1
as a factor $\alpha_1$ with two mutually exclusive levels, $1^{\alpha_1}$ (location
measurement) and $2^{\alpha_1}$ (momentum measurement); and the measurement
on particle 2 as a factor $\alpha_2$ with two mutually exclusive
levels, $1^{\alpha_2}$ and $2^{\alpha_2}$, interpreted analogously. The two measurements
can be combined in treatments, $(1^{\alpha_1}, 1^{\alpha_2})$, $(1^{\alpha_1}, 2^{\alpha_2})$, etc.,
but not within a factor, $(1^{\alpha_1}, 2^{\alpha_1})$ or $(1^{\alpha_2}, 2^{\alpha_2})$. The result of
each of the measurements is a random variable, $A_1$ for particle
1 and $A_2$ for particle 2. The possible values $\mathcal{A}_1$ for $A_1$ are possible
locations of particle 1 if $\alpha_1$ is at level $1^{\alpha_1}$, but they are
possible momentum values for particle 1 if $\alpha_1$ is at level $2^{\alpha_1}$
(which makes it awkward but still possible to maintain the convention
mentioned in footnote 2). It is easier with spins (Bohm
& Aharonov, 1957): for instance, for spin-1/2 particles (such as
electrons), $\mathcal{A}_1$ consists of two possible values of spin in one direction
if $\alpha_1$ is at level $1^{\alpha_1}$ and of two possible values of spin in
another direction if the level is $2^{\alpha_1}$. These two two-element sets
are more natural to consider "the same."

With all this in mind, the question now can be posed in a
form familiar to us: can we say that $(A_1, A_2) \looparrowleft (\alpha_1, \alpha_2)$, or
can the measurement (factor) $\alpha_1$ influence the result (random
variable) $A_2$ and/or $\alpha_2$ influence $A_1$? In the Einstein-Podolsky-Rosen
(EPR) paradigm involving entangled particles, the two
random outcomes $A_1, A_2$ are stochastically interdependent, and
their joint distribution at every treatment is (correctly) predicted
by the quantum theory. The question therefore becomes: are the
predicted (and observed) joint distributions of $(A_1, A_2)$ compatible
with the hypothesis $(A_1, A_2) \looparrowleft (\alpha_1, \alpha_2)$? Einstein, Podolsky,
and Rosen (1935) took $(A_1, A_2) \looparrowleft (\alpha_1, \alpha_2)$ for granted if the two
particles are separated in space and measured simultaneously (in
some inertial frame of reference).

Bell's (1964) celebrated theorem shows that
$(A_1, A_2) \looparrowleft (\alpha_1, \alpha_2)$ is not the case for entangled spin-1/2 particles
obeying the laws of quantum mechanics. The reason this result is
considered to be of foundational importance ("the most profound
discovery in science," repeating the oft-quoted characterization
by Stapp, 1975) is that Bell essentially adopted Definition 2.1 for
$(A_1, A_2) \looparrowleft (\alpha_1, \alpha_2)$ and identified the random entity $R$ with the
set of all hidden variables of a conceivable theory "explaining"
the dependence of $(A_1, A_2)$ on $(\alpha_1, \alpha_2)$: knowing a value of $R$
one would be able to predict, through the functions $f_1$ and $f_2$ of
Definition 2.1, the values of $(A_1, A_2)$. In addition to being called
"hidden," the variables entailed in $R$ are referred to as
"context-independent" (meaning that the distribution of $R$ and the
functions $f_1, f_2$ do not depend on treatments) and "local" (meaning,
essentially, that in the theory involving $R$ and $f_1, f_2$ the measurement
$\alpha_1$ does not influence $A_2$, nor does $\alpha_2$ influence $A_1$). Bell's
(1964) theorem therefore is interpreted as stating that quantum
predictions regarding two entangled spin-1/2 particles cannot be
explained by any theory involving context-independent and local
variables.

A rejection of $(A_1, A_2) \looparrowleft (\alpha_1, \alpha_2)$ in quantum physics can
be handled by dispensing with locality (Bohm's approach), but
most physicists find this untenable (measurement $\alpha_1$ cannot
influence $A_2$ if they are separated by a space-like interval). The
quantum probability theory can be viewed as a way of allowing
for context-dependence while retaining locality. In behavioral
applications both locality and context-independence can be targeted
when $(A_1, A_2) \looparrowleft (\alpha_1, \alpha_2)$ is rejected, and distinguishing
the two is a challenge.

Following the logic of Bell's work, Clauser, Horne, Shimony,
& Holt (1969) derived a system of inequalities that are necessary
conditions for $(A_1, A_2) \looparrowleft (\alpha_1, \alpha_2)$ in the EPR paradigm
with two particles and two measurements (factors) with binary
outcomes. These inequalities are subsumed in Fine's (1982a-b)
ones (discussed in Section 3.5), which present both necessary
and sufficient conditions for $(A_1, A_2) \looparrowleft (\alpha_1, \alpha_2)$, based on JDC.
The latter was introduced in Fine's papers for the first time (and
called by this name too), although the earlier Suppes and Zanotti's
(1981) Theorem on Common Causes can also be viewed
as a special form of JDC.
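
A small numerical sketch may help to see what such inequalities exclude. It relies on assumptions not spelled out in this section: outcomes are coded as ±1, the inequalities are taken in their common textbook correlation form (each of the four sums of the correlations, with one term sign-reversed, does not exceed 2 in absolute value), and the singlet-state correlation is taken to be $-\cos(a - b)$ for measurement axes at angles $a$ and $b$, the standard quantum-mechanical prediction. The function names and the chosen angles are hypothetical.

```python
import math
from itertools import product

def chsh_values(E):
    """E maps a treatment (i, j) to the expected product of the two ±1 outcomes.
    Returns the four CHSH combinations, each with one term sign-reversed."""
    return [
        sum(-e if key == minus else e for key, e in E.items())
        for minus in E
    ]

def singlet_E(a, b):
    """Assumed quantum prediction for the singlet state."""
    return -math.cos(a - b)

# Hypothetical axis settings (radians) for the two levels of each factor.
a = {1: 0.0, 2: math.pi / 2}
b = {1: math.pi / 4, 2: 3 * math.pi / 4}
E_quantum = {(i, j): singlet_E(a[i], b[j]) for i in a for j in b}
quantum_max = max(abs(v) for v in chsh_values(E_quantum))

# Deterministic models: each factor point fixes the outcome, A_i B_j being the product.
classical_max = max(
    abs(v)
    for a1, a2, b1, b2 in product([-1, 1], repeat=4)
    for v in chsh_values({(1, 1): a1 * b1, (1, 2): a1 * b2,
                          (2, 1): a2 * b1, (2, 2): a2 * b2})
)

print(classical_max, round(quantum_max, 3))  # 2 2.828
```

Every deterministic assignment of outcomes to factor points stays within the bound of 2, and so does any probabilistic mixture of such assignments, whereas the assumed quantum correlations reach $2\sqrt{2}$.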

Fine's inequalities form a special case of the Linear Feasibility 
Test considered in the next section. We therefore defer further 
discussion of the EPR paradigm to Section 3.5, and conclude the 
present section by the following table of correspondences: 



Selective Probabilistic Causality           Quantum Entanglement Problem (for spins)
-----------------------------------------   ----------------------------------------------------------------
observed random output                      detected spin value of a given particle
factor/input                                spin measurement in a given particle
factor level                                setting (axis) of the spin measurement
joint distribution criterion                joint distribution criterion
canonical diagram of selective influences   "classical" explanation (by context-independent local variables)




3. LINEAR FEASIBILITY TEST 

In this section we assume that for each random variable $A_i(\phi)$
in $(A_1, \ldots, A_n)(\phi)$ the set $\mathcal{A}_i$ of its possible values has $m_i$
elements, $a_1^i, \ldots, a_{m_i}^i$. It is arguably the most important special
case both because it is ubiquitous and because in all other cases
random variables can be discretized into a finite number of categories.
We are interested in establishing the truth or falsity of
the diagram of selective influences (3), where each factor $\alpha_j$ in
$(\alpha_1, \ldots, \alpha_n)$ contains $k_j$ factor points $x_1^j, \ldots, x_{k_j}^j$ (written so
instead of the more formal $x_1^{\alpha_j}, \ldots, x_{k_j}^{\alpha_j}$). The Linear Feasibility Test
(LFT) to be described is a direct application of JDC to this situation,
furnishing a necessary and sufficient condition for the
diagram of selective influences $(A_1, \ldots, A_n) \looparrowleft (\alpha_1, \ldots, \alpha_n)$.



3.1. The test 



In the hypothetical JDC-vector

$$H = (H_{x_1^1}, \ldots, H_{x_{k_1}^1}, \ldots, H_{x_1^n}, \ldots, H_{x_{k_n}^n}),$$

since we assume that, for any point $x_j^i$ of factor $\alpha_i$ and any
treatment $\phi$ containing $x_j^i$,

$$H_{x_j^i} \sim A_i(\phi),$$

we know that the set of possible values for the random variable
$H_{x_j^i}$ is $\{a_1^i, \ldots, a_{m_i}^i\}$, irrespective of $j$. Denote

$$\Pr[A_1 = a_{l_1}^1, \ldots, A_n = a_{l_n}^n](x_{j_1}^1, \ldots, x_{j_n}^n) = P(\underbrace{l_1, \ldots, l_n}_{\text{for r.v.s}}; \underbrace{j_1, \ldots, j_n}_{\text{for factor points}}), \tag{6}$$

where $l_i \in \{1, \ldots, m_i\}$ and $j_i \in \{1, \ldots, k_i\}$ for $i = 1, \ldots, n$ ("r.v.s"
abbreviates "random variables"). Denote

$$\Pr[H_{x_1^1} = a_{l_{11}}^1, \ldots, H_{x_{k_1}^1} = a_{l_{1k_1}}^1, \ldots, H_{x_1^n} = a_{l_{n1}}^n, \ldots, H_{x_{k_n}^n} = a_{l_{nk_n}}^n] = Q(\underbrace{l_{11}, \ldots, l_{1k_1}}_{\text{for } A_1}, \ldots, \underbrace{l_{n1}, \ldots, l_{nk_n}}_{\text{for } A_n}), \tag{7}$$

where $l_{ij} \in \{1, \ldots, m_i\}$ for $i = 1, \ldots, n$. This gives us
$m_1^{k_1} \times \ldots \times m_n^{k_n}$ $Q$-probabilities. A required joint distribution for the
JDC-vector $H$ exists if and only if these probabilities can be found
subject to $m_1^{k_1} \times \ldots \times m_n^{k_n}$ nonnegativity constraints

$$Q(l_{11}, \ldots, l_{1k_1}, \ldots, l_{n1}, \ldots, l_{nk_n}) \geq 0, \tag{8}$$

and (denoting by $n_T$ the number of treatments in $T$)
$n_T \times m_1 \times \ldots \times m_n$ linear equations

$$\sum Q(l_{11}, \ldots, l_{1k_1}, \ldots, l_{n1}, \ldots, l_{nk_n}) = P(l_1, \ldots, l_n; j_1, \ldots, j_n), \tag{9}$$

where the summation is across all possible values of
$(l_{11}, \ldots, l_{1k_1}, \ldots, l_{n1}, \ldots, l_{nk_n})$ subject to

$$l_{1j_1} = l_1, \ldots, l_{nj_n} = l_n.$$ 6

This can be more compactly formulated in a matrix form. Let the observable probabilities $P(l_1,\ldots,l_n; j_1,\ldots,j_n)$ constitute components of an $n_T \times m_1 \times \ldots \times m_n$-dimensional column vector $P$, with its cells lexicographically enumerated by $(l_1,\ldots,l_n; j_1,\ldots,j_n)$. Let the hypothetical probabilities $Q(l_{11},\ldots,l_{1k_1},\ldots,l_{n1},\ldots,l_{nk_n})$ constitute components of an $m_1^{k_1} \times \ldots \times m_n^{k_n}$-dimensional column vector $Q$, with its cells lexicographically enumerated by $(l_{11},\ldots,l_{1k_1},\ldots,l_{n1},\ldots,l_{nk_n})$. Let $M$ be a Boolean matrix with $n_T \times m_1 \times \ldots \times m_n$ rows and $m_1^{k_1} \times \ldots \times m_n^{k_n}$ columns lexicographically enumerated in the same way as, respectively, $P$ and $Q$, such that the entry in the cell in the $(l_1,\ldots,l_n; j_1,\ldots,j_n)$th row and $(l_{11},\ldots,l_{1k_1},\ldots,l_{n1},\ldots,l_{nk_n})$th column is 1 if $l_{1j_1} = l_1,\ldots,l_{nj_n} = l_n$; otherwise the entry is 0. Clearly, the vector $Q$ exists if and only if the system

$$MQ = P, \quad Q \ge 0 \tag{10}$$

(with the inequality understood componentwise) has a solution. 
This is a typical linear programming (LP) problem. More pre- 
cisely, this is an LP task in the standard form and with a dummy 
objective function (e.g., a linear combination with zero coeffi- 
cients). It is known (Karmarkar, 1984; Khachiyan, 1979) that it 
is always possible, in polynomial time, to either find a solution 
for such a system or to determine that it does not exist. Many 
standard software packages can handle this problem (e.g., GNU 
Linear Programming Kit at http://www.gnu.org/software/glpk/). 
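
As an illustration of the matrix form (10), the following sketch builds the Boolean matrix $M$ for arbitrary numbers of output values and factor points. It is only a minimal sketch: the function name `build_lft_matrix`, the 1-based tuple labels, and the row order (treatments in the outer loop, output values in the inner loop) are assumptions made here for readability, not part of the original text.

```python
from itertools import product

def build_lft_matrix(m, k, treatments=None):
    """Return (M, rows, cols) for the system M Q = P of the Linear Feasibility Test.

    m[i] is the number of values of output A_i, k[i] the number of points of factor i.
    Each row is labelled (l, j): output values l = (l_1..l_n) at treatment j = (j_1..j_n).
    Each column is labelled by the values (l_11..l_1k_1, ..., l_n1..l_nk_n) of the JDC-vector H.
    All labels are 1-based, as in the text.
    """
    n = len(m)
    if treatments is None:  # maximal set T: all combinations of factor points
        treatments = list(product(*(range(1, ki + 1) for ki in k)))
    # One H-variable per factor point; offsets[i] is where factor i's block starts in a column label.
    offsets = [sum(k[:i]) for i in range(n)]
    h_ranges = [range(1, m[i] + 1) for i in range(n) for _ in range(k[i])]
    cols = list(product(*h_ranges))
    rows = [(l, j) for j in treatments
            for l in product(*(range(1, mi + 1) for mi in m))]
    # Entry is 1 iff the H-variable selected by each factor point equals the observed output value.
    M = [[int(all(col[offsets[i] + j[i] - 1] == l[i] for i in range(n)))
          for col in cols] for (l, j) in rows]
    return M, rows, cols

M, rows, cols = build_lft_matrix(m=(2, 2), k=(2, 2))
print(len(M), "rows,", len(M[0]), "columns")
```

Any LP solver that accepts equality constraints with nonnegative variables can then be applied to `M` and the observed vector `P`, provided `P` is listed in the same row order.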

3.2. Properties of the LP problem 

The rank of matrix $M$ is always strictly smaller than the number of components in $P$. This follows from the fact that for any two allowable treatments $(j_1,\ldots,j_n)$ and $(j'_1,\ldots,j'_n)$ that share a subvector

$$(j_{1'},\ldots,j_{s'}) = (j'_{1'},\ldots,j'_{s'})$$

(where we use $\{1',\ldots,s'\}$ to designate $s$ distinct elements chosen from $\{1,\ldots,n\}$), and for any fixed $(v_1,\ldots,v_s)$, the sum of all rows of $M$ corresponding to $(l_1,\ldots,l_n; j_1,\ldots,j_n)$th components of $P$ with $(l_{1'},\ldots,l_{s'}) = (v_1,\ldots,v_s)$ is the same Boolean vector as the sum of all rows of $M$ corresponding to $(l_1,\ldots,l_n; j'_1,\ldots,j'_n)$th components of $P$ with the same property. The upper limit for the rank of matrix $M$ is given in the following theorem.



6 The sum of all $Q$'s is 1 because it equals the sum of all $P$'s (across all $l_1,\ldots,l_n$) for any given treatment $(j_1,\ldots,j_n)$.



Theorem 3.1. The rank of $M$ for a maximal set of treatments $T = \alpha_1 \times \ldots \times \alpha_n$ is

$$(k_1(m_1-1)+1)\ldots(k_n(m_n-1)+1).$$

Proof. Given any

$$\{1',\ldots,s'\} \subset \{1,\ldots,n\},$$

$$(j_{1'},\ldots,j_{s'}) \in \{1,\ldots,k_{1'}\} \times \ldots \times \{1,\ldots,k_{s'}\},$$

$$(l_{1'},\ldots,l_{s'}) \in \{1,\ldots,m_{1'}\} \times \ldots \times \{1,\ldots,m_{s'}\},$$

let $V(1',\ldots,s'; j_{1'},\ldots,j_{s'}; l_{1'},\ldots,l_{s'})$ denote an $(m_1)^{k_1}\ldots(m_n)^{k_n}$-component Boolean row vector whose components are lexicographically enumerated in the same way as $Q$, and such that its $(l_{11},\ldots,l_{1k_1},\ldots,l_{n1},\ldots,l_{nk_n})$th component is 1 if and only if

$$l_{1'j_{1'}} = l_{1'},\ldots,l_{s'j_{s'}} = l_{s'}.$$

The rows of matrix $M$ are $V(1,\ldots,n; j_1,\ldots,j_n; l_1,\ldots,l_n)$-vectors. It is easy to check that for any fixed $(1',\ldots,s'; j_{1'},\ldots,j_{s'})$, the sum of the rows of $M$ corresponding to fixed values $(l_{1'},\ldots,l_{s'})$ is $V(1',\ldots,s'; j_{1'},\ldots,j_{s'}; l_{1'},\ldots,l_{s'})$. It follows that for $s = n, n-1, \ldots, 1$, a vector $V(1',\ldots,s'; j_{1'},\ldots,j_{s'}; l_{1'},\ldots,l_{s'})$ in which all $l_{i'} = 1$ except for $i' \in \{1'',\ldots,v''\} \subset \{1',\ldots,s'\}$ (a subset of $v < s$ distinct elements), is a linear combination of the vector

$$V(1'',\ldots,v''; j_{1''},\ldots,j_{v''}; l_{1''},\ldots,l_{v''})$$

and all the vectors

$$V(1',\ldots,s'; j_{1'},\ldots,j_{s'}; l_{1'},\ldots,l_{s'})$$

for which all $l_{i'} > 1$ and

$$\{j_{1''},\ldots,j_{v''}; l_{1''},\ldots,l_{v''}\} \subset \{j_{1'},\ldots,j_{s'}; l_{1'},\ldots,l_{s'}\}.$$

As a result the rows of $M$ are linear combinations of the rows of $M^*$ consisting of vectors

$$V(1',\ldots,s'; j_{1'},\ldots,j_{s'}; l_{1'},\ldots,l_{s'})$$

for all possible

$$\{1',\ldots,s'\} \subset \{1,\ldots,n\},$$

$$(j_{1'},\ldots,j_{s'}) \in \{1,\ldots,k_{1'}\} \times \ldots \times \{1,\ldots,k_{s'}\},$$

$$(l_{1'},\ldots,l_{s'}) \in \{2,\ldots,m_{1'}\} \times \ldots \times \{2,\ldots,m_{s'}\}.$$

By straightforward combinatorics the number of such vectors is

$$(k_1(m_1-1)+1)\ldots(k_n(m_n-1)+1).$$

The rows of $M^*$ are linearly independent because the column corresponding to the $(l_{11} = 1,\ldots,l_{1k_1} = 1,\ldots,l_{n1} = 1,\ldots,l_{nk_n} = 1)$th component of $Q$ contains a single 1, in the row of $M^*$ corresponding to $s = 0$ (which row contains 1's only). □

Note that

$$k_i(m_i-1)+1 < m_i^{k_i}$$

for all $k_i \ge 2$ and $m_i > 1$. This means that

$$(k_1(m_1-1)+1)\ldots(k_n(m_n-1)+1) < (m_1)^{k_1}\ldots(m_n)^{k_n},$$

and the system $MQ = P$ is always underdetermined.
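
The rank formula of Theorem 3.1 can be checked numerically on small cases. The sketch below is an illustration only: it rebuilds $M$ for the maximal set of treatments, computes its rank by exact Gauss-Jordan elimination over fractions, and compares the result with $(k_1(m_1-1)+1)\ldots(k_n(m_n-1)+1)$. The function names (`lft_matrix`, `rank`, `predicted_rank`) and the 0-based labels are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import product
from math import prod

def lft_matrix(m, k):
    """Boolean matrix M for the maximal set of treatments (all combinations of factor points)."""
    n = len(m)
    offsets = [sum(k[:i]) for i in range(n)]  # start of factor i's block of H-variables
    cols = list(product(*(range(m[i]) for i in range(n) for _ in range(k[i]))))
    rows = [(l, j) for j in product(*(range(ki) for ki in k))
            for l in product(*(range(mi) for mi in m))]
    return [[int(all(c[offsets[i] + j[i]] == l[i] for i in range(n))) for c in cols]
            for (l, j) in rows]

def rank(matrix):
    """Rank by Gauss-Jordan elimination in exact rational arithmetic."""
    A = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                f = A[i][c] / A[r][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == len(A):
            break
    return r

def predicted_rank(m, k):
    """Rank stated in Theorem 3.1."""
    return prod(ki * (mi - 1) + 1 for mi, ki in zip(m, k))

for m, k in [((2, 2), (2, 2)), ((3, 2), (2, 2)), ((2, 2), (3, 2))]:
    print(m, k, rank(lft_matrix(m, k)), predicted_rank(m, k))
```

For two binary outputs and two two-point factors the predicted rank is $3 \times 3 = 9$, which is smaller than both the 16 rows and the 16 columns of $M$.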



Corollary 3.2. If $P$ satisfies marginal selectivity, then system (10) is equivalent to

$$M^*Q = P^*, \quad Q \ge 0, \tag{11}$$

where $M^*$ is as defined in the proof above, and $P^*$ is the "reduced hierarchical" vector with components

$$\Pr\left[A_{1'} = a^{1'}_{l_{1'}},\ldots,A_{s'} = a^{s'}_{l_{s'}}\right]\left(x^{1'}_{j_{1'}},\ldots,x^{s'}_{j_{s'}}\right), \tag{12}$$

where $s = 0,\ldots,n$, $\{1',\ldots,s'\} \subset \{1,\ldots,n\}$, and $l_{i'} \in \{2,\ldots,m_{i'}\}$ for each $i' \in \{1',\ldots,s'\}$. $M^*$ is of full row rank.

To comment on this corollary, it follows from the proof of Theorem 3.1 that $MQ = P$ never has a solution if vector $P$ violates the equality

$$\sum \Pr\left[A_1 = a^1_{l_1},\ldots,A_n = a^n_{l_n}\right](\phi) = \sum \Pr\left[A_1 = a^1_{l_1},\ldots,A_n = a^n_{l_n}\right](\phi'),$$

where $\phi$ and $\phi'$ are any two allowable treatments sharing the factor points $x^{1'}_{j_{1'}},\ldots,x^{s'}_{j_{s'}}$, and the summation is across all values of $(l_1,\ldots,l_n)$ with a fixed $(l_{1'},\ldots,l_{s'})$. Clearly, this necessary condition is just another way of stating marginal selectivity. Assuming that $P$ does satisfy marginal selectivity, it can be represented by the "reduced hierarchical" vector $P^*$ whose components are marginal probabilities of all orders, with $s = 0$ corresponding to the probability 1.
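
Marginal selectivity, as used in the corollary, can be checked directly before any linear programming is attempted. The sketch below assumes that the observed probabilities are stored in a dictionary keyed by (output values, treatment); the names `make_P` and `satisfies_marginal_selectivity` and the tolerance are hypothetical choices for this illustration. The data are those of Example 3.3 below.

```python
from itertools import combinations

def make_P(tables):
    """tables[(x, y)] = (p11, p12, p21, p22) -> dict keyed by ((a, b), (x, y))."""
    cells = [(1, 1), (1, 2), (2, 1), (2, 2)]
    return {(cell, j): p for j, probs in tables.items() for cell, p in zip(cells, probs)}

def satisfies_marginal_selectivity(P, tol=1e-9):
    """True if every marginal of a proper subset of outputs depends only on 'its own' factor points."""
    treatments = sorted({j for (_, j) in P})
    n = len(treatments[0])
    for s in range(1, n):                      # proper, non-empty subsets of outputs
        for subset in combinations(range(n), s):
            seen = {}                          # (factor points, output values) -> marginal probability
            for j in treatments:
                marg = {}
                for (l, jj), p in P.items():
                    if jj == j:
                        key = (tuple(j[i] for i in subset), tuple(l[i] for i in subset))
                        marg[key] = marg.get(key, 0.0) + p
                for key, value in marg.items():
                    # Compare with the marginal found at an earlier treatment sharing these points.
                    if abs(seen.setdefault(key, value) - value) > tol:
                        return False
    return True

example_3_3 = make_P({(1, 1): (.140, .360, .360, .140), (1, 2): (.198, .302, .302, .198),
                      (2, 1): (.189, .311, .311, .189), (2, 2): (.460, .040, .040, .460)})
print(satisfies_marginal_selectivity(example_3_3))
```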



3.3. Examples 

Example 3.3. Let $\alpha = \{1^\alpha, 2^\alpha\}$, $\beta = \{1^\beta, 2^\beta\}$, and the set of allowable treatments $T$ consist of all four possible combinations of the factor points. Let $A$ and $B$ be Bernoulli variables, $a_1 = b_1 = 1$, $a_2 = b_2 = 2$, distributed as shown:

 α  β    A  B    Pr          α  β    A  B    Pr
 1  1    1  1    .140        1  2    1  1    .198
         1  2    .360                1  2    .302
         2  1    .360                2  1    .302
         2  2    .140                2  2    .198

 α  β    A  B    Pr          α  β    A  B    Pr
 2  1    1  1    .189        2  2    1  1    .460
         1  2    .311                1  2    .040
         2  1    .311                2  1    .040
         2  2    .189                2  2    .460

Marginal selectivity here is satisfied trivially: all marginal probabilities are equal to 0.5, for all treatments. In the matrix form of the LFT, the column-vector of the above 16 probabilities,

$$(.140, .360, .360, \ldots, .040, .040, .460)^T,$$

using $T$ for transposition, is denoted by $P$. The LFT problem is defined by the system $MQ = P$, $Q \ge 0$, where the $16 \times 16$ Boolean matrix $M$ is shown below: each column of the matrix corresponds to a combination of values for the hypothetical $H$-variables (shown above the matrix), while each row corresponds to a combination of a treatment with values of the outputs $A, B$ (shown on the left).

              H_{1^α}   1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
              H_{2^α}   1 1 1 1 2 2 2 2 1 1 1 1 2 2 2 2
              H_{1^β}   1 1 2 2 1 1 2 2 1 1 2 2 1 1 2 2
              H_{2^β}   1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2
 α  β    A  B
 1  1    1  1           1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0
         1  2           0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0
         2  1           0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0
         2  2           0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1
 1  2    1  1           1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0
         1  2           0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0
         2  1           0 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0
         2  2           0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1
 2  1    1  1           1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0
         1  2           0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0
         2  1           0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0
         2  2           0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1
 2  2    1  1           1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0
         1  2           0 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0
         2  1           0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0
         2  2           0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1

The linear programming routine of Mathematica™ (using the interior point algorithm) shows that the linear equations (9) have nonnegative solutions corresponding to the JDC-vector

 H_{1^α}  H_{2^α}  H_{1^β}  H_{2^β}    Pr
    1        1        1        1       .02708610
    1        1        1        2       .00239295
    1        1        2        1       .16689300
    1        1        2        2       .03358610
    1        2        1        1       .00197965
    1        2        1        2       .10854100
    1        2        2        1       .00204128
    1        2        2        2       .15748000
    2        1        1        1       .15748000
    2        1        1        2       .00204128
    2        1        2        1       .10854100
    2        1        2        2       .00197965
    2        2        1        1       .03358610
    2        2        1        2       .16689300
    2        2        2        1       .00239295
    2        2        2        2       .02708610

The column-vector of these probabilities constitutes $Q \ge 0$. This proves that in this case we do have $(A,B) \looparrowleft (\alpha,\beta)$. □

Example 3.4. In the previous example, let us change the distributions of $(A,B)$ to the following:

 α  β    A  B    Pr          α  β    A  B    Pr
 1  1    1  1    .450        1  2    1  1    .105
         1  2    .050                1  2    .395
         2  1    .050                2  1    .395
         2  2    .450                2  2    .105

 α  β    A  B    Pr          α  β    A  B    Pr
 2  1    1  1    .170        2  2    1  1    .110
         1  2    .330                1  2    .390
         2  1    .330                2  1    .390
         2  2    .170                2  2    .110

Once again, marginal selectivity is satisfied trivially, as all marginal probabilities are 0.5, for all treatments. The linear programming routine of Mathematica™, however, shows that the linear equations (9) have no nonnegative solutions. This excludes the existence of a JDC-vector for this situation, thereby ruling out the possibility of $(A,B) \looparrowleft (\alpha,\beta)$. □
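
The outcomes of Examples 3.3 and 3.4 can be reproduced without a dedicated LP package. The sketch below is not the Mathematica routine mentioned in the text: as a stand-in it uses a small phase-one simplex method with Bland's pivoting rule in exact rational arithmetic, which decides whether $MQ = P$, $Q \ge 0$ has a solution. The function names (`build_M`, `feasible`) and the row order (treatments (1,1), (1,2), (2,1), (2,2), each with outputs 11, 12, 21, 22) are assumptions of this illustration.

```python
from fractions import Fraction
from itertools import product

TREATMENTS = [(1, 1), (1, 2), (2, 1), (2, 2)]

def build_M(treatments):
    """Boolean matrix for two binary outputs and two 2-point factors (16 columns)."""
    cols = list(product((1, 2), repeat=4))  # values of (H_1a, H_2a, H_1b, H_2b)
    rows = [(a, b, x, y) for (x, y) in treatments for a, b in product((1, 2), repeat=2)]
    return [[int(c[x - 1] == a and c[2 + y - 1] == b) for c in cols] for (a, b, x, y) in rows]

def feasible(M, P):
    """Phase-one simplex (Bland's rule): is there Q >= 0 with M Q = P? Requires P >= 0."""
    n_rows, n_cols = len(M), len(M[0])
    # Tableau [M | I | P]; the artificial variables (identity block) form the initial basis.
    T = [[Fraction(v) for v in M[r]]
         + [Fraction(int(r == i)) for i in range(n_rows)]
         + [Fraction(P[r])] for r in range(n_rows)]
    basis = [n_cols + r for r in range(n_rows)]
    while True:
        art_rows = [r for r in range(n_rows) if basis[r] >= n_cols]
        # Reduced cost of column c for the objective "sum of artificial variables".
        enter = next((c for c in range(n_cols + n_rows)
                      if (c >= n_cols) - sum(T[r][c] for r in art_rows) < 0), None)
        if enter is None:
            break
        # Ratio test; ties are broken by the smallest basic-variable index.
        _, _, piv = min((T[r][-1] / T[r][enter], basis[r], r)
                        for r in range(n_rows) if T[r][enter] > 0)
        pivot_value = T[piv][enter]
        T[piv] = [v / pivot_value for v in T[piv]]
        for r in range(n_rows):
            if r != piv and T[r][enter] != 0:
                f = T[r][enter]
                T[r] = [v - f * w for v, w in zip(T[r], T[piv])]
        basis[piv] = enter
    # Feasible exactly when all artificial variables can be driven to zero.
    return sum(T[r][-1] for r in range(n_rows) if basis[r] >= n_cols) == 0

M = build_M(TREATMENTS)
P_33 = ["0.140", "0.360", "0.360", "0.140", "0.198", "0.302", "0.302", "0.198",
        "0.189", "0.311", "0.311", "0.189", "0.460", "0.040", "0.040", "0.460"]
P_34 = ["0.450", "0.050", "0.050", "0.450", "0.105", "0.395", "0.395", "0.105",
        "0.170", "0.330", "0.330", "0.170", "0.110", "0.390", "0.390", "0.110"]
print(feasible(M, P_33), feasible(M, P_34))
```

Probabilities are passed as decimal strings so that `Fraction` represents them exactly; with floating-point input, marginal selectivity could fail by rounding error alone and a tolerance would be needed.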



3.4. Renaming and grouping 

Since LFT is both a necessary and sufficient condition for selective influences, if it is passed for $(A_1,\ldots,A_n)(\phi)$, it is guaranteed to be passed following any factor-point-specific transformations of these random outputs. All such transformations in the case of discrete random variables can be described as combinations of renaming (factor-point specific one) and coarsening (grouping of some values together). In fact, the outcome of LFT simply does not depend on the values of the random variables involved, only their probabilities matter. Therefore a renaming will not change anything in the system of linear equations and inequalities (8)-(9). An example of coarsening will be redefining $A$ and $B$, each having possible values 1, 2, 3, 4, into binary variables

$$A^*(\phi) = \begin{cases} 1 & \text{if } A(\phi) = 1,2,\\ 2 & \text{if } A(\phi) = 3,4, \end{cases} \qquad B^*(\phi) = \begin{cases} 1 & \text{if } B(\phi) = 1,2,3,\\ 2 & \text{if } B(\phi) = 4. \end{cases}$$

It is clear that any such redefinition amounts to replacing some of the equations in (9) with their sums. Therefore, if the original system has a solution, so will the system after such replacements. Of course, the reverse is not generally true: the coarser system can have solutions when the original system does not.
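
Coarsening of outputs is easy to express on the vector of observed probabilities. The sketch below merges output values according to user-supplied maps, adding the probabilities of merged cells, which mirrors the summation of equations in (9). The function name `coarsen_outputs` and the dictionary layout (keys are pairs of output values and treatment) are assumptions of this illustration; the uniform distribution is made-up data.

```python
from itertools import product

def coarsen_outputs(P, maps):
    """Group output values: maps[i] sends each old value of the i-th output to its new value.

    P maps (l, j) -> probability, with l the tuple of output values and j the treatment.
    """
    coarse = {}
    for (l, j), p in P.items():
        new_l = tuple(maps[i][v] for i, v in enumerate(l))
        coarse[(new_l, j)] = coarse.get((new_l, j), 0.0) + p
    return coarse

# A and B take values 1..4; A* groups {1,2} and {3,4}, B* groups {1,2,3} and {4}.
map_A = {1: 1, 2: 1, 3: 2, 4: 2}
map_B = {1: 1, 2: 1, 3: 1, 4: 2}

# Made-up data: a uniform joint distribution at a single treatment (1, 1).
P = {((a, b), (1, 1)): 1 / 16 for a, b in product(range(1, 5), repeat=2)}
P_star = coarsen_outputs(P, (map_A, map_B))
print(P_star)
```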

The same is true for coarsening the system by grouping together some of the factor points within factors. Suppose we want to group together points $x^1_1$ and $x^1_2$ of factor $\alpha_1$ containing more than two points. This means that the probabilities $P(l_1,l_2,\ldots,l_n; j_1,j_2,\ldots,j_n)$ are redefined as$^{7}$

$$P'(l_1,l_2,\ldots,l_n; j_1,j_2,\ldots,j_n) = \begin{cases} \tfrac12 P(l_1,l_2,\ldots,l_n; 1,j_2,\ldots,j_n) + \tfrac12 P(l_1,l_2,\ldots,l_n; 2,j_2,\ldots,j_n) & \text{if } j_1 = 1,\\ P(l_1,l_2,\ldots,l_n; j_1+1,j_2,\ldots,j_n) & \text{if } j_1 > 1. \end{cases}$$

When we average the original equations for $P(l_1,l_2,\ldots,l_n; 1,j_2,\ldots,j_n)$ and $P(l_1,l_2,\ldots,l_n; 2,j_2,\ldots,j_n)$, we get

$$\sum \left( \tfrac12 \sum_{l_{12}} Q(l_{11} = l_1, l_{12}, l_{13},\ldots,l_{1k_1},\ldots,l_{n1},\ldots,l_{nk_n}) + \tfrac12 \sum_{l_{11}} Q(l_{11}, l_{12} = l_1, l_{13},\ldots,l_{1k_1},\ldots,l_{n1},\ldots,l_{nk_n}) \right) = P'(l_1,l_2,\ldots,l_n; 1,j_2,\ldots,j_n),$$

where $l_{2j_2} = l_2,\ldots,l_{nj_n} = l_n$ and the outer summation is across all $l_{ij}$ except for the following values of $(i,j)$: $(1,1)$, $(1,2)$, and $(i,j_i)$, $i = 2,\ldots,n$. We define a new vector $Q'$ whose dimensionality is less than that of $Q$ by one, putting

$$Q'(l_{12} = l, l_{13},\ldots,l_{1k_1},\ldots,l_{n1},\ldots,l_{nk_n}) = \tfrac12 \sum_{l_{12}} Q(l_{11} = l, l_{12}, l_{13},\ldots,l_{1k_1},\ldots,l_{n1},\ldots,l_{nk_n}) + \tfrac12 \sum_{l_{11}} Q(l_{11}, l_{12} = l, l_{13},\ldots,l_{1k_1},\ldots,l_{n1},\ldots,l_{nk_n}),$$

where $l$ has the same range as any of $l_{1j}$. (For notational simplicity, in $Q'$ we do not re-enumerate $(1,3)$ as $(1,2)$, $(1,4)$ as $(1,3)$, etc., leaving thereby $l_{11}$ undefined.)

For any point of factor $\alpha_1$ other than $x^1_1$ and $x^1_2$, say, $x^1_3$, we have then

$$\sum \sum_{l_{11},l_{12}} Q(l_{11},l_{12},l_{13} = l_1,\ldots,l_{1k_1},\ldots,l_{n1},\ldots,l_{nk_n}) = P(l_1,l_2,\ldots,l_n; 3,j_2,\ldots,j_n),$$

which can be presented as

$$\sum \sum_{l} \left( \tfrac12 \sum_{l_{12}} Q(l_{11} = l, l_{12}, l_{13} = l_1,\ldots,l_{1k_1},\ldots,l_{n1},\ldots,l_{nk_n}) + \tfrac12 \sum_{l_{11}} Q(l_{11}, l_{12} = l, l_{13} = l_1,\ldots,l_{1k_1},\ldots,l_{n1},\ldots,l_{nk_n}) \right) = P(l_1,l_2,\ldots,l_n; 3,j_2,\ldots,j_n).$$

This is equivalent to

$$\sum Q'(l_{12}, l_{13} = l_1,\ldots,l_{1k_1},\ldots,l_{n1},\ldots,l_{nk_n}) = P'(l_1,l_2,\ldots,l_n; j_1 = 2,j_2,\ldots,j_n),$$

where $l_{2j_2} = l_2,\ldots,l_{nj_n} = l_n$, and the summation is across all $l_{ij}$ except for $(i,j) = (1,3)$ and $(i,j) = (i,j_i)$, $i = 2,\ldots,n$. So we have obtained a solution for the factor-coarsened system from a solution for the original system.



7 More general mixtures, $\pi P(l_1,l_2,\ldots,l_n; 1,j_2,\ldots,j_n) + (1-\pi) P(l_1,l_2,\ldots,l_n; 2,j_2,\ldots,j_n)$ for $0 < \pi < 1$, are dealt with as easily; moreover, $\pi = 1$ formally corresponds to dropping the factor point $x^1_2$, considered below. The values of $\pi$ other than 1/2 and 1 can be useful if the grouping is done on a sample level, to reflect the differences in sample sizes for $x^1_1$ and $x^1_2$.

Dropping a point, say, $x^1_2$ is even simpler: we delete all rows with $j_1 = 2$, and then redefine the $Q$ vector as

$$Q'(l_{11},l_{13},\ldots,l_{1k_1},\ldots,l_{n1},\ldots,l_{nk_n}) = \sum_{l_{12}} Q(l_{11},l_{12},l_{13},\ldots,l_{1k_1},\ldots,l_{n1},\ldots,l_{nk_n}).$$

If the random variables involved have infinitely many values and/or the factors consist of infinitely many factor points, or if these numbers, though finite, are too large to handle the ensuing linear programming problem, then LFT can still be used after the values of the random variables and/or factors have been appropriately grouped. LFT then becomes only a necessary condition for selective influences (with respect to the original system of factors and random variables), and its results will generally be different for different (non-nested) groupings.

Example 3.5. Consider the hypothesis $(A,B) \looparrowleft (\alpha,\beta)$ with the factors having a finite number of factor points each, and $A$ and $B$ being response times. To use LFT, one can transform the random variable $A$ as, say,

$$A^*(\phi) = \begin{cases} 1 & \text{if } A(\phi) < a_{1/4}(\phi),\\ 2 & \text{if } a_{1/4}(\phi) \le A(\phi) < a_{1/2}(\phi),\\ 3 & \text{if } a_{1/2}(\phi) \le A(\phi) < a_{3/4}(\phi),\\ 4 & \text{if } A(\phi) \ge a_{3/4}(\phi), \end{cases}$$

and transform $B$ as

$$B^*(\phi) = \begin{cases} 1 & \text{if } B(\phi) < b_{1/2}(\phi),\\ 2 & \text{if } B(\phi) \ge b_{1/2}(\phi), \end{cases}$$

where $a_p(\phi)$ and $b_p(\phi)$ designate the $p$th quantiles of, respectively, $A(\phi)$ and $B(\phi)$. The initial hypothesis now is reformulated as $(A^*,B^*) \looparrowleft (\alpha,\beta)$, with the understanding that if it is rejected then the initial hypothesis will be rejected too (a necessary condition only). LFT will now be applied to distributions of the form

 α  β    A* B*   Pr
 x  y    1  1    p_11
         1  2    p_12
         ...     ...
         4  1    p_41
         4  2    p_42

where the marginals for $A^*$ are constrained to 0.25 and the marginals for $B^*$ to 0.5, for all treatments $\{x^\alpha, y^\beta\}$, yielding a trivial compliance with marginal selectivity. Note that the test may very well uphold $(A^*,B^*) \looparrowleft (\alpha,\beta)$ even if marginal selectivity is violated for $(A,B)(\phi)$ (e.g., if the quantiles $a_p(x^\alpha,y^\beta)$ change as a function of $y^\beta$). □
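
On the sample level, the quantile-based recoding of Example 3.5 can be sketched as follows for a single treatment. This is an illustration under assumptions: sample quantiles (as computed by `statistics.quantiles` with its default method) replace the theoretical quantiles, a value equal to a cut point is assigned to the higher category, and the names `discretize` and `joint_table` as well as the data are invented for this sketch.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles

def discretize(samples, n_categories):
    """Recode one treatment's response times into categories 1..n_categories by sample quantiles."""
    cuts = quantiles(samples, n=n_categories)  # n_categories - 1 cut points
    return [bisect_right(cuts, t) + 1 for t in samples]

def joint_table(pairs):
    """Relative frequencies of (A*, B*) for paired observations (a, b) at one treatment."""
    a_star = discretize([a for a, _ in pairs], 4)  # quartile split of A
    b_star = discretize([b for _, b in pairs], 2)  # median split of B
    counts = Counter(zip(a_star, b_star))
    return {cell: c / len(pairs) for cell, c in counts.items()}

# Made-up paired response times for one treatment.
pairs = [(float(i), float((i * 37) % 100)) for i in range(1, 101)]
table = joint_table(pairs)
marg_A = {a: sum(p for (x, _), p in table.items() if x == a) for a in (1, 2, 3, 4)}
marg_B = {b: sum(p for (_, y), p in table.items() if y == b) for b in (1, 2)}
print(marg_A, marg_B)
```

Because the cut points are recomputed within each treatment, the marginals of the recoded variables are (up to ties and rounding) 0.25 and 0.5 at every treatment, which is the trivial compliance with marginal selectivity noted in the example.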

3.5. Quantum entanglement 

Fine's (1982a-b) inequalities relate to the simplest EPR paradigm, with the number of particles $n = 2$, number of spin axes per particle $k_1 = k_2 = 2$, and the number of possible spin values per particle $m_1 = m_2 = 2$ (this value being the same for all spin axes chosen for a given particle). They can be written, with reference to (6) and (12), as

$$-1 \le P(2,2;j_1,j_2) + P(2,2;j'_1,j_2) + P(2,2;j'_1,j'_2) - P(2,2;j_1,j'_2) - P^*_1(2;j'_1) - P^*_2(2;j_2) \le 0,$$

where $j_1,j'_1 \in \{1,2\}$, $j_2,j'_2 \in \{1,2\}$, $j_1 \ne j'_1$, $j_2 \ne j'_2$. These inequalities constitute the necessary and sufficient conditions for $(A_1,A_2) \looparrowleft (\alpha_1,\alpha_2)$, with marginal selectivity assumed implicitly. Although Fine's derivation of these inequalities is different, they can be derived as solutions of system (11), with $P^*$ the 9-component vector (using $T$ for transposition)

$$(1, P^*_1(2;1), \ldots, P^*_2(2;2), P(2,2;1,1), \ldots, P(2,2;2,2))^T,$$

$Q$ the 16-component vector

$$(Q(1,1,1,1), \ldots, Q(2,2,2,2))^T,$$

and $M^*$ the corresponding $9 \times 16$ Boolean matrix,

              H_{1^α}   1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
              H_{2^α}   1 1 1 1 2 2 2 2 1 1 1 1 2 2 2 2
              H_{1^β}   1 1 2 2 1 1 2 2 1 1 2 2 1 1 2 2
              H_{2^β}   1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2
 α  β    A  B
 ·  ·    ·  ·           1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1  ·    2  ·           0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
 2  ·    2  ·           0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
 ·  1    ·  2           0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
 ·  2    ·  2           0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
 1  1    2  2           0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1
 1  2    2  2           0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1
 2  1    2  2           0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1
 2  2    2  2           0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1




In fact, using a standard facet enumeration program (e.g.,
lrs program at http://cgm.cs.mcgill.ca/~avis/C/lrs.html) these in-
equalities (together with the equalities representing marginal se-
lectivity) can be derived "mechanically." The essence of the
computation is in the fact that a linear system (10) or (11) is 
feasible if and only if the point P (respectively, P*) belongs to 
the convex hull of the points corresponding to the columns of M 
(respectively, M*), which form a subset of the vertices of a unit 
hypercube. The facet enumeration programs derive inequalities 
describing this convex hull. 
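
For the simplest case ($n = 2$, $k_1 = k_2 = 2$, $m_1 = m_2 = 2$) the explicit inequalities can be checked directly. The sketch below assumes the Clauser-Horne form of Fine's inequalities, $-1 \le P(2,2;j_1,j_2) + P(2,2;j'_1,j_2) + P(2,2;j'_1,j'_2) - P(2,2;j_1,j'_2) - P^*_1(2;j'_1) - P^*_2(2;j_2) \le 0$ with $j_1 \ne j'_1$ and $j_2 \ne j'_2$, and presupposes marginal selectivity. The function name `fine_inequalities_hold` and its argument layout are hypothetical; the data are those of Examples 3.3 and 3.4.

```python
from itertools import product

def fine_inequalities_hold(p22, pA2, pB2, tol=1e-12):
    """Check the eight inequalities for two binary outputs and two 2-point factors.

    p22[(j1, j2)] = Pr[A=2, B=2] at treatment (j1, j2);
    pA2[j1] = Pr[A=2] at point j1 of the first factor; pB2[j2] = Pr[B=2] at point j2 of the second.
    """
    for j1, j2 in product((1, 2), repeat=2):
        k1, k2 = 3 - j1, 3 - j2  # the "primed" points j1', j2'
        s = (p22[(j1, j2)] + p22[(k1, j2)] + p22[(k1, k2)] - p22[(j1, k2)]
             - pA2[k1] - pB2[j2])
        if not (-1 - tol <= s <= tol):
            return False
    return True

half = {1: 0.5, 2: 0.5}  # all marginals equal 0.5 in both examples
ex_3_3 = {(1, 1): .140, (1, 2): .198, (2, 1): .189, (2, 2): .460}
ex_3_4 = {(1, 1): .450, (1, 2): .105, (2, 1): .170, (2, 2): .110}
print(fine_inequalities_hold(ex_3_3, half, half))
print(fine_inequalities_hold(ex_3_4, half, half))
```

In Example 3.4 the violated inequality is the one in which $P(2,2;1,1) = .450$ enters with the minus sign: the expression equals $.105 + .170 + .110 - .450 - 1 = -1.065 < -1$.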

Given a set of numerical (experimentally estimated or theoretical) probabilities, solving the LP problem (10) or (11) is always preferable to dealing with explicit inequalities, as their number becomes very large even for moderate-size vectors $P$. While Fine's inequalities for $n = 2$, $k_1 = k_2 = 2$, $m_1 = m_2 = 2$ (assuming marginal selectivity) number just 8, already for $n = 2$, $k_1 = k_2 = 2$ with $m_1 = m_2 = 3$ (describing, e.g., an EPR experiment with two spin-1 particles, or two spin-1/2 ones and inefficient detectors), our computations yield 1080 inequalities, and for $n = 3$, $k_1 = k_2 = k_3 = 2$ and $m_1 = m_2 = m_3 = 2$, corresponding to the Greenberger, Horne, & Zeilinger (1989) paradigm with three spin-1/2 particles, this number is 53792.

The potential of JDC to lead to LFT and provide an ulti- 
mate criterion for the entanglement problem has not been uti- 
lized in quantum physics until relatively recently, when LFT was 
proposed in Werner & Wolf (2001a, b) and Basoalto & Perci- 
val (2003). Prior to this, criteria (as opposed to just necessary 
conditions) for the possibility of a classical explanation for an 
EPR paradigm involving multiple particles, multiple measure-
ment settings, and multiple outcomes per measurement were
only known under strong symmetry constraints (de Barros & 
Suppes, 2001; Garg, 1983; Mermin, 1990; Peres, 1999). 



3.6. Sample-level tests 

Although this paper is not concerned with statistical questions, it may be useful to mention some of the approaches to constructing sample-level tests based on LFT. As mentioned in Section 3.5, the set of our vectors $P$ for which the system $MQ = P$, $Q \ge 0$ has a solution forms a convex polytope. In particular, if the set $T$ of allowable treatments contains all combinations of factor points, the polytope is the $((k_1(m_1-1)+1)\ldots(k_n(m_n-1)+1)-1)$-dimensional convex hull of the points corresponding to the columns of the Boolean matrix $M$, which form a subset of the vertices of the $n_T \times m_1 \times \ldots \times m_n$-dimensional unit hypercube (one coordinate per row of $M$). Recently Davis-Stober (2009) developed a statistical theory for testing the hypothesis that a vector of probabilities $P$ (not necessarily of the same structure as in LFT) belongs to a convex polytope $\mathcal{P}$ against the hypothesis that it does not. Under certain regularity constraints he derived the asymptotic distribution (a convex mixture of chi-square distributions) for the log maximum likelihood ratio statistic

$$-2\log\frac{\max_{P\in\mathcal{P}} L(P\mid N)}{\max_{P} L(P\mid N)},$$

where $N$ is the vector of observed absolute frequencies, comprised of the numbers of occurrences of $(l_1,\ldots,l_n; j_1,\ldots,j_n)$ in the case of LFT. The likelihoods $L(P\mid N)$ are computed using the standard theory of multinomial distributions. This theory has been "test-driven" on the polytopes related to the transitivity of preferences problem (Regenwetter, Dana, & Davis-Stober, 2010, 2011). A Bayesian approach to the same problem is presented in Myung, Karabatsos, & Iverson (2005).

Other approaches readily suggest themselves. One of them is to use the known theory of $L(P\mid N)/\max_P L(P\mid N)$ to compute a confidence region of possible probability vectors $P$ for a given empirical vector $N$. The hypothesis of selective influences is retained or rejected according as this confidence region contains or does not contain a point $P$ that passes LFT. Resampling is another obvious approach, e.g., the permutation test in which the assignment of empirical distributions to different treatments is randomly "reshuffled" so that each distribution generally ends up assigned to a "wrong" treatment. If the proportion of the permuted assignments whose deviation from the LFT polytope does not exceed that of the observed estimate of $P$ is sufficiently small, the hypothesis of selective influences can be considered supported.

Little is known at present about the computational feasibil- 
ity and statistical properties of these and similar procedures. In 
particular (this also applies to Davis-Stober's test), we do not 
know their statistical power for different locations of the true 
vector of probabilities outside the convex polytope described by 
$MQ = P$, $Q \ge 0$. Nor do we know how the effect size, a mea-
sure of deviation of P from the polytope, should be computed
optimally. All of this will have to be investigated separately. 



4. CONCLUSION 

Selectiveness in the influences exerted by a set of inputs upon 
a set of random and stochastically interdependent outputs is a 
critical feature of many psychological models, often built into 
the very language of these models. We speak of an internal rep- 
resentation of a given stimulus, as separate from an internal rep- 
resentation of another stimulus, even if these representations are 
considered random entities and they are not independent. We 
speak of decompositions of response time into signal-dependent 
and signal-independent components, or into a perceptual stage 
(influenced by stimuli) and a memory-search stage (influenced 
by the number of memorized items), without necessarily assum- 
ing that the two components or stages are stochastically inde- 
pendent. 

In this paper, we have described the Linear Feasibility Test, 
an application of the fundamental Joint Distribution Criterion 
for selective influences to random variables with finite numbers 
of values. This test can be performed by means of standard lin- 
ear programming. Due to the fact that any random output can 
be discretized, the Linear Feasibility Test is universally appli- 
cable, although one should keep in mind that if a diagram of 
selective influences is upheld by the test at some discretization, 
it may be rejected at a finer or non-nested discretization (but not 
at a coarser one). Both the Joint Distribution Criterion and the 
Linear Feasibility Test, although new in the behavioral context, 
have their direct analogues in quantum physics, in dealing with 
the problem of the existence of a classical explanation (one with 
non-contextual, local hidden variables) for outcomes of noncom-
muting measurements performed on entangled particles. The
discovery of these parallels promises to enrich and facilitate our
understanding of selective influences.